Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/IB/MLX5/DC: introducing dcs_hybrid policy #10138

Merged
merged 1 commit into from
Sep 30, 2024

Conversation

roiedanino
Copy link
Contributor

@roiedanino roiedanino commented Sep 9, 2024

What

Introducing a new dci allocation policy - dcs_hybrid which will behave the same as dcs_quota except using a dedicated HW dci when available instead of waiting for a dci from the pool to be available

Why ?

Utilizing hw resources instead of waiting for another dci, improves latency for certain message sizes

How ?

In case dci_alloc_or_create returned 0 - use the dedicated HW dci with a new dci channel.

@roiedanino roiedanino self-assigned this Sep 9, 2024
src/uct/ib/mlx5/dc/dc_mlx5_ep.c Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
Comment on lines 373 to 374
uct_dc_mlx5_dci_pool_init_dci(iface, uct_dc_mlx5_ep_pool_index(ep),
UCT_DC_MLX5_HW_DCI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean HW DCS is the first DCI, that is most likely to be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't mean that, because in alloc_or_create we are allocating by the length of dcis array:
image

src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
src/uct/ib/mlx5/dc/dc_mlx5_ep.h Outdated Show resolved Hide resolved
Comment on lines 711 to 719
if (!uct_dc_mlx5_iface_is_hybrid(iface)) {
return UCS_ERR_NO_RESOURCE;
}

if (ep->dci == UCT_DC_MLX5_HW_DCI) {
return UCS_OK;
}

if (uct_dc_mlx5_iface_dci_has_tx_resources(iface, UCT_DC_MLX5_HW_DCI)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to reduce number of branches on the fast path?

@roiedanino
Copy link
Contributor Author

Add hw_dci attribute to iface which will be -1 if hybrid was not selected

Copy link
Contributor

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create some DC specific gtests for this new mode?

@@ -901,7 +903,7 @@ uct_dc_mlx5_iface_dcis_create(uct_dc_mlx5_iface_t *iface,

ucs_array_length(&iface->tx.dcis) = 0;

status = uct_dc_mlx5_iface_create_dci(iface, 0, 0);
status = uct_dc_mlx5_iface_create_dci(iface, 0, 0, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why just one channel and not self->tx.num_dci_channels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter as this dci is only created to query bb_max and then destroyed (lines 913-914)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic, probably need to change the name of this func, as it is quite confusing currently

src/uct/ib/mlx5/dc/dc_mlx5_ep.h Show resolved Hide resolved
Comment on lines 725 to 726
if (uct_dc_mlx5_iface_is_dci_shared(iface) ||
uct_dc_mlx5_is_hw_dci(iface, ep->dci)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems these two check are frequently used together. separate func is worth creating

Comment on lines 341 to 342
return uct_dc_mlx5_iface_is_dci_shared(iface) ||
uct_dc_mlx5_is_hw_dci(iface, dci);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we optimize this to be a single branch? it is used in uct_dc_mlx5_iface_dci_get

Copy link
Contributor Author

@roiedanino roiedanino Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess replacing || with +?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both these checks are checking the policy field, maybe we can a bitmap based check or something like "policy >= UCT_DC_TX_POLICY_DCS_HYBRID" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_hw_dci just checks dci == iface->tx.hybrid_hw_dci so >= DCS_HYBRID won't help, we can just use bitwise | between is_shared and is_hw_dci if that's somehow better

@@ -212,6 +212,7 @@ typedef struct uct_dc_dci {
uint8_t path_index; /* Path index */
uint8_t next_channel_index; /* next DCI channel index
to be used by EP */
uint8_t is_shared; /* Indicates that this specific dci is shared, regardless of policy*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. space before *
  2. line seems too long
  3. maybe make it uint8_t flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we can't avoid that extra branch because now we don't know if dci is UCT_DC_MLX5_EP_NO_DCI, resulting in sanitizer failure (accessing dcis[255])

@@ -366,6 +365,8 @@ uct_dc_mlx5_dci_pool_init_dci(uct_dc_mlx5_iface_t *iface, uint8_t pool_index,
return status;
}

dci->is_shared = uct_dc_mlx5_iface_is_policy_shared(iface) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set it later - initialize according to fields order in the struct

src/uct/ib/mlx5/dc/dc_mlx5.h Show resolved Hide resolved
ucs_arbiter_t *waitq;
uct_rc_txqp_t *txqp;
int16_t available;

ucs_assert(!iface->super.super.config.tx_moderation);

if (uct_dc_mlx5_iface_is_dci_shared(iface)) {
if (ep->dci == UCT_DC_MLX5_EP_NO_DCI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove assert+comment in line 745?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls also update or remove the comment in that line

@roiedanino roiedanino merged commit 13af469 into openucx:master Sep 30, 2024
141 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants